Smart Paper Notes

Randomized Quantization for Data Agnostic Representation Learning

Huimin Wu , Chenyang Lei , Xiao Sun , Peng-Shuai Wang , Qifeng Chen , Kwang-Ting Cheng , Stephen Lin , Zhirong Wu

Category: Computer Vision

2022-12-19

Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part. Towards this end, masking has emerged as a generic and powerful tool where content is withheld along the sequential dimension, e.g., spatial in images, temporal in audio, and syntactic in language. In this paper, we explore the orthogonal channel dimension for generic data augmentation. The data for each channel is quantized through a non-uniform quantizer, with the quantized value sampled randomly within randomly sampled quantization bins. From another perspective, quantization is analogous to channel-wise masking, as it removes the information within each bin, but preserves the information across bins. We apply the randomized quantization in conjunction with sequential augmentations on self-supervised contrastive models. This generic approach achieves results on par with modality-specific augmentation on vision tasks, and state-of-the-art results on 3D point clouds as well as on audio. We also demonstrate this method to be applicable for augmenting intermediate embeddings in a deep neural network on the comprehensive DABS benchmark, which comprises various data modalities. Code is available at http://www.github.com/microsoft/random_quantize.
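
The quantization step described in this abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `randomized_quantize`, the `num_bins` parameter, and the choice to draw bin edges uniformly between each channel's minimum and maximum are assumptions made for illustration only.

```python
import numpy as np

def randomized_quantize(x, num_bins=8, rng=None):
    """Quantize each channel of x (shape: channels, ...) with random bins.

    Hypothetical sketch: bin edges and per-bin output values are both
    drawn at random, independently for every channel.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for c in range(x.shape[0]):
        channel = x[c]
        lo, hi = channel.min(), channel.max()
        # Non-uniform bins: interior edges are random, then sorted.
        interior = np.sort(rng.uniform(lo, hi, size=num_bins - 1))
        edges = np.concatenate(([lo], interior, [hi]))
        # One random representative value per bin, drawn inside that bin.
        reps = rng.uniform(edges[:-1], edges[1:])
        # Index of the bin containing each input value.
        idx = np.searchsorted(edges, channel, side="right") - 1
        idx = np.clip(idx, 0, num_bins - 1)
        out[c] = reps[idx]
    return out
```

Values that fall in the same bin collapse to one number (information within a bin is removed), while the ordering between bins is kept, which matches the "channel-wise masking" view given in the abstract.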

Translated by Google Translate

Domain Adaptation for Question Answering via Question Classification

Zhenrui Yue , Huimin Zeng , Ziyi Kou , Lanyu Shang , Dong Wang

Categories: Natural Language Processing | Artificial Intelligence

2022-09-12

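Question answering (QA) has shown impressive progress in answering questions from customized domains. However, domain adaptation remains one of the most elusive challenges for QA systems, especially when a QA system is trained in a source domain but deployed in a different target domain. In this work, we investigate the potential benefits of question classification for QA domain adaptation. We propose a novel framework: Question Classification for Question Answering (QC4QA). Specifically, a question classifier is adopted to assign question classes to both the source and target data. We then perform joint training in a self-supervised fashion via pseudo-labeling. For optimization, the inter-domain discrepancy between the source and target domains is reduced via the maximum mean discrepancy (MMD) distance. We additionally minimize the intra-class discrepancy among QA samples of the same question class to improve adaptation performance. To the best of our knowledge, this is the first work in QA domain adaptation to leverage question classification with self-supervised adaptation. We demonstrate the effectiveness of the proposed QC4QA with consistent improvements over state-of-the-art baselines on multiple datasets.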
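
As a rough illustration of the MMD distance mentioned in this abstract, the sketch below computes a biased estimate of squared MMD between two sets of feature vectors. The RBF kernel, the `gamma` value, and the function names are assumptions for illustration; the abstract does not say which kernel or estimator QC4QA uses.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel values between rows of a and rows of b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def mmd2(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two samples (rows = examples)."""
    k_ss = rbf_kernel(source, source, gamma).mean()
    k_tt = rbf_kernel(target, target, gamma).mean()
    k_st = rbf_kernel(source, target, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st
```

The estimate is close to zero when both samples come from the same distribution and grows as the two feature distributions move apart, which is why it can serve as a training penalty on the gap between source and target domains.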

Contrastive Domain Adaptation for Early Misinformation Detection: A Case Study on COVID-19

Zhenrui Yue , Huimin Zeng , Ziyi Kou , Lanyu Shang , Dong Wang

Categories: Computer Vision | Artificial Intelligence | Natural Language Processing

2022-08-20

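Despite recent progress in improving the performance of misinformation detection systems, classifying misinformation in an unseen domain remains an elusive challenge. To address this issue, a common approach is to introduce a domain critic and encourage domain-invariant input features. However, early misinformation often exhibits both conditional and label shifts relative to existing misinformation data (e.g., class imbalance in COVID-19 datasets), which makes such methods less effective for detecting early misinformation. In this paper, we propose a contrastive adaptation network for early misinformation detection (CANMD). Specifically, we leverage pseudo-labeling to generate high-confidence target examples for joint training with source data. We additionally design a label correction component to estimate and correct the label shift (i.e., class priors) between the source and target domains. Moreover, a contrastive adaptation loss is integrated into the objective function to reduce the intra-class discrepancy and enlarge the inter-class discrepancy. As a result, the adapted model learns corrected class priors and a conditional distribution that is invariant across both domains, improving the estimation of the target data distribution. To demonstrate the effectiveness of the proposed CANMD, we study the case of early COVID-19 misinformation detection and perform extensive experiments on multiple real-world datasets. The results show that CANMD can effectively adapt misinformation detection systems to the unseen COVID-19 target domain, with significant improvements over state-of-the-art baselines.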
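
The pseudo-labeling step in this abstract can be sketched as a confidence filter over model predictions on unlabeled target data. The function name `select_pseudo_labels` and the default `threshold` of 0.9 are hypothetical; the abstract does not state how CANMD defines "high confidence".

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.9):
    """Keep target examples whose top predicted probability is high enough.

    probs: array of shape (num_examples, num_classes) with class probabilities.
    Returns the indices of the kept examples and their pseudo-labels.
    """
    probs = np.asarray(probs, dtype=np.float64)
    confidence = probs.max(axis=1)   # top-class probability per example
    labels = probs.argmax(axis=1)    # predicted class per example
    keep = np.flatnonzero(confidence >= threshold)
    return keep, labels[keep]
```

For example, with predictions `[[0.95, 0.05], [0.6, 0.4], [0.08, 0.92]]` and the default threshold, only the first and third examples would be kept and added to the joint training set.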

A Survey on Recent Advances and Challenges in Reinforcement Learning Methods for Task-Oriented Dialogue Policy Learning

Wai-Chung Kwan , Hongru Wang , Huimin Wang , Kam-Fai Wong

Category: Natural Language Processing

2022-02-28

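Dialogue policy learning is a key component of a task-oriented dialogue system (TDS): it decides the system's next action given the dialogue state at each turn. Reinforcement learning (RL) is commonly chosen to learn the dialogue policy, treating the user as the environment and the system as the agent. Many benchmark datasets and algorithms have been created to facilitate the development and evaluation of RL-based dialogue policies. In this paper, we survey recent advances and challenges in RL-based dialogue policy learning. More specifically, we identify the major problems and summarize the corresponding solutions. In addition, we provide a comprehensive survey of applying RL to dialogue policy learning by categorizing recent methods according to the basic elements of RL. We believe this survey can shed light on future research in dialogue management.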

Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems

Jiayu Chen , Yuanxin Zhang , Yuanfan Xu , Huimin Ma , Huazhong Yang , Jiaming Song , Yu Wang , Yi Wu

Category: Machine Learning

2021-11-08

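We introduce a curriculum learning algorithm, Variational Automatic Curriculum Learning (VACL), for solving challenging goal-conditioned cooperative multi-agent reinforcement learning problems. We motivate our paradigm from a variational perspective, where the learning objective can be decomposed into two terms: task learning on the current task distribution, and a curriculum update to a new task distribution. Local optimization over the second term suggests that the curriculum should gradually expand the training tasks from easy to hard. Our VACL algorithm implements this variational paradigm with two practical components, task expansion and entity progression, which produce training curricula over both the task configurations and the number of entities in the task. Experimental results show that VACL solves a collection of sparse-reward problems with a large number of agents. In particular, using a single desktop machine, VACL achieves a 98% coverage rate with 100 agents on the simple-spread benchmark and reproduces the ramp-use behavior originally shown in OpenAI's hide-and-seek project. Our project website is at https://sites.google.com/view/vacl-neurips-2021.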

Integrating Pretrained Language Model for Dialogue Policy Learning

Hongru Wang , Huimin Wang , Zezhong Wang , Kam-Fai Wong

Categories: Natural Language Processing | Artificial Intelligence

2021-11-02

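Reinforcement learning (RL) has shown its potential for training a dialogue policy agent to maximize the accumulated reward given by users. However, the reward can be very sparse, because it is usually provided only at the end of a dialogue session, which leads to an unaffordable number of interactions before an acceptable dialogue agent is obtained. Unlike many efforts that alternate between optimizing the policy and recovering the reward, and that easily get stuck in local optima or suffer from model collapse, we decompose adversarial training into two steps: 1) we integrate a pre-trained language model as a discriminator to judge whether the current system action is good enough for the last user action (i.e., next action prediction); 2) the discriminator gives an extra local dense reward to guide the agent's exploration. Experimental results show that our method significantly improves the completion rate (~4.4%) and success rate (~8.0%) of the dialogue system.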
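
The second step, adding a dense per-turn reward from the discriminator to the sparse task reward, can be sketched as follows. The additive combination, the `weight` parameter, and the function name are assumptions made for illustration; the abstract does not specify how the two reward signals are combined.

```python
def shaped_rewards(env_rewards, disc_scores, weight=0.5):
    """Add a weighted discriminator score to the environment reward per turn.

    env_rewards: per-turn task rewards (often zero until the final turn).
    disc_scores: per-turn discriminator scores for the chosen system action.
    """
    if len(env_rewards) != len(disc_scores):
        raise ValueError("one discriminator score is needed per turn")
    return [r + weight * s for r, s in zip(env_rewards, disc_scores)]
```

With a task reward that is zero until the last turn, the shaped sequence still carries a non-zero learning signal at every turn, which is the motivation given in the abstract for the extra dense reward.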

SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation and Prediction

Hao Dong , Xianjing Zhang , Xuan Jiang , Jun Zhang , Jintao Xu , Rui Ai , Weihao Gu , Huimin Lu , Juho Kannala , Xieyuanli Chen

Categories: Computer Vision | Robotics

2022-11-28

High-definition (HD) semantic map generation of the environment is an essential component of autonomous driving. Existing methods have achieved good performance in this task by fusing different sensor modalities, such as LiDAR and camera. However, current works are based on raw data or network feature-level fusion and only consider short-range HD map generation, limiting their deployment to realistic autonomous driving applications. In this paper, we focus on both building HD maps at short range, i.e., within 30 m, and predicting long-range HD maps up to 90 m, which are required by downstream path planning and control tasks to improve the smoothness and safety of autonomous driving. To this end, we propose a novel network named SuperFusion, exploiting the fusion of LiDAR and camera data at multiple levels. We benchmark our SuperFusion on the nuScenes dataset and a self-recorded dataset and show that it outperforms the state-of-the-art baseline methods by large margins. Furthermore, we propose a new metric to evaluate the long-range HD map prediction and apply the generated HD map to a downstream path planning task. The results show that by using the long-range HD maps predicted by our method, we can achieve better path planning for autonomous vehicles. The code will be available at https://github.com/haomo-ai/SuperFusion.

Few-shot Image Generation with Diffusion Models

Jingyuan Zhu , Huimin Ma , Jiansheng Chen , Jian Yuan

Category: Computer Vision

2022-11-07

Denoising diffusion probabilistic models (DDPMs) have been proven capable of synthesizing high-quality images with remarkable diversity when trained on large amounts of data. However, to our knowledge, few-shot image generation tasks have yet to be studied with DDPM-based approaches. Modern approaches are mainly built on Generative Adversarial Networks (GANs) and adapt models pre-trained on large source domains to target domains using a few available samples. In this paper, we make the first attempt to study when DDPMs overfit and suffer severe diversity degradation as training data become scarce. Then we propose to adapt DDPMs pre-trained on large source domains to target domains using limited data. Our results show that utilizing knowledge from pre-trained DDPMs can significantly accelerate convergence and improve the quality and diversity of the generated images. Moreover, we propose a DDPM-based pairwise similarity loss to preserve the relative distances between generated samples during domain adaptation. In this way, we further improve the generation diversity of the proposed DDPM-based approaches. We demonstrate the effectiveness of our approaches qualitatively and quantitatively on a series of few-shot image generation tasks and achieve results better than current state-of-the-art GAN-based approaches in quality and diversity.

DPNet: Dual-Path Network for Real-time Object Detection with Lightweight Attention

Quan Zhou , Huimin Shi , Weikang Xiang , Bin Kang , Xiaofu Wu , Longin Jan Latecki

Category: Computer Vision

2022-09-28

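Recent advances in compressing high-accuracy convolutional neural networks (CNNs) have brought remarkable progress in real-time object detection. To accelerate detection, lightweight detectors typically use a single-path backbone with only a few convolutional layers. However, a single-path architecture involves successive pooling and downsampling operations, which often produce coarse and inaccurate feature maps that make it harder to locate objects. On the other hand, due to their limited network capacity, recent lightweight networks are often weak at representing large-scale visual data. To address these problems, this paper presents a dual-path network, named DPNet, with a lightweight attention scheme for real-time object detection. The dual-path architecture enables us to extract high-level semantic features and low-level object details in parallel. Although DPNet nearly duplicates the structure of a single-path detector, the computational cost and model size do not increase significantly. To enhance representation capability, a lightweight self-correlation module (LSCM) is designed to capture global interactions with very little computational overhead and few network parameters. In the neck, LSCM is extended into a lightweight cross-correlation module (LCCM), which captures mutual dependencies among features of neighboring scales. We have conducted exhaustive experiments on the MS COCO and Pascal VOC 2007 datasets. The experimental results show that DPNet achieves a state-of-the-art trade-off between detection accuracy and implementation efficiency. Specifically, DPNet achieves 30.5% AP on MS COCO test-dev and 81.5% mAP on the Pascal VOC 2007 test set, with a model size of nearly 2.5M, 1.04 GFLOPs, and 164 FPS and 196 FPS for 320 x 320 input images on the two datasets.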

ScaleFormer: Revisiting the Transformer-based Backbones from a Scale-wise Perspective for Medical Image Segmentation

Huimin Huang , Shiao Xie , Lanfen Lin , Yutaro Iwamoto , Xianhua Han , Yen-Wei Chen , Ruofeng Tong

Category: Computer Vision

2022-07-29

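Recently, a variety of vision transformers have been developed for their ability to model long-range dependencies. In current transformer-based backbones for medical image segmentation, convolutional layers are either replaced with pure transformers, or transformers are added to the deepest encoder to learn global context. However, from a scale-wise perspective, there are two main challenges: (1) the intra-scale problem: existing methods fall short in extracting local-global cues within each scale, which may affect the signal propagation of small objects; (2) the inter-scale problem: existing methods fail to explore distinctive information from multiple scales, which may hinder representation learning for objects that vary widely in size, shape, and location. To address these limitations, we propose a novel backbone, namely ScaleFormer, with two appealing designs: (1) a scale-wise intra-scale transformer is designed to couple CNN-based local features with transformer-based global cues in each scale, where row-wise and column-wise global dependencies can be extracted by a lightweight Dual-Axis MSA; (2) a simple and effective spatial-aware inter-scale transformer is designed to interact among consensual regions across multiple scales, which can highlight cross-scale dependencies and resolve complex scale variations. Experimental results on different benchmarks show that our ScaleFormer outperforms current state-of-the-art methods. The code is publicly available at https://github.com/zjugivelab/scaleformer.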
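
To make the idea of row-wise and column-wise global dependencies concrete, the sketch below applies plain single-head self-attention along each axis of a feature map and sums the two results. It is a generic axial-style illustration and not the paper's Dual-Axis MSA: it has no learned projections and no multiple heads, and all names here are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def axis_attention(x):
    """Self-attention over the middle axis of x, shape (batch, length, dim).

    Queries, keys, and values are all x itself (no learned weights).
    """
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x

def dual_axis_attention(fmap):
    """Row-wise plus column-wise attention on a feature map of shape (H, W, C)."""
    # Each row attends over its W positions.
    row_out = axis_attention(fmap)
    # Each column attends over its H positions (swap axes, attend, swap back).
    col_out = axis_attention(fmap.transpose(1, 0, 2)).transpose(1, 0, 2)
    return row_out + col_out
```

Attending along one axis at a time needs score matrices of size W x W per row and H x H per column, instead of a single (H*W) x (H*W) matrix for full 2D attention, which is the usual reason such axis-wise schemes are described as lightweight.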
